
18 research outputs found

Algorithmic Analysis of Complex Audio Scenes

Author: Bardeli Rolf
Publication venue: Universitäts- und Landesbibliothek Bonn

In this thesis, we examine the problem of algorithmic analysis of complex audio scenes, with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive was available in digital form, there was little infrastructure for searching and sharing this data. We describe a distributed infrastructure for searching, sharing and collaboratively annotating animal sound collections, which we have developed in this context. Although searching animal sound databases by metadata gives good results for many applications, annotating all occurrences of a specific sound is beyond the scope of human annotators. Moreover, finding vocalisations similar to a given example is not feasible using metadata alone. We therefore propose an algorithm for content-based similarity search in animal sound databases. Based on principles of image processing, we develop suitable features for the description of animal sounds. We enhance a concept for content-based multimedia retrieval by a ranking scheme which makes it an efficient tool for similarity search. One of the main sources of complexity in natural audio scenes, and the most difficult problem for pattern recognition, is the large number of sound sources which are active at the same time. We therefore examine methods for source separation based on microphone arrays. In particular, we propose an algorithm for the extraction of simpler components from complex audio scenes based on a sound complexity measure. Finally, we introduce pattern recognition algorithms for the vocalisations of a number of bird species. Some of these species are interesting for reasons of nature conservation, while one of the species serves as a prototype for songbirds with strongly structured songs.

bonndoc – Der Publikationsserver der Universität Bonn

Formalizing the Problem of Music Description

Author: Bardeli Rolf
Emiya Valentin
Langlois Thibault
Sturm Bob L.
Publication venue: ISMIR
Publication date: 01/01/2015

VBN

A covering problem that is easy for trees but NP-complete for trivalent graphs

Author: Bardeli Rolf
Clausen Michael
Ribbrock Andreas
Publication venue: Elsevier B.V.

By definition, a P2-graph Γ is an undirected graph in which every vertex is contained in a path of length two. For such a graph, pc(Γ) denotes the minimum number of paths of length two that cover all n vertices of Γ. We prove that ⌈n/3⌉ ≤ pc(Γ) ≤ ⌊n/2⌋ and show that these upper and lower bounds are tight. Furthermore, we show that every connected P2-graph Γ contains a spanning tree T such that pc(Γ) = pc(T). We present a linear-time algorithm that produces optimal 2-path covers for trees. This is contrasted by the result that deciding whether pc(Γ) = n/3 is NP-complete for trivalent graphs. This graph-theoretical problem originates from the task of searching a large database of biological molecules, such as the Protein Data Bank (PDB), by content.
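
The bounds stated in this abstract can be checked on small graphs. The following is a minimal sketch, not the paper's linear-time tree algorithm: it computes pc(Γ) by exhaustive search, which is only practical for a handful of vertices. The function names and the two example graphs are assumptions chosen for illustration.

```python
from itertools import combinations

def from_edges(edges):
    """Build an adjacency map {vertex: set of neighbours} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def two_paths(adj):
    """Vertex sets of all paths of length two (a - b - c with b in the middle)."""
    paths = set()
    for middle, neighbours in adj.items():
        for a, c in combinations(sorted(neighbours), 2):
            paths.add(frozenset((a, middle, c)))
    return list(paths)

def path_cover_number(adj):
    """Smallest number of 2-paths covering every vertex (brute force)."""
    vertices = set(adj)
    paths = two_paths(adj)
    for k in range(1, len(vertices) + 1):
        for chosen in combinations(paths, k):
            if set().union(*chosen) == vertices:
                return k
    raise ValueError("not a P2-graph: some vertex lies on no path of length two")

# Hypothetical examples, both with n = 6 vertices.
path6 = from_edges([(i, i + 1) for i in range(5)])        # path 0-1-2-3-4-5
star6 = from_edges([(0, leaf) for leaf in range(1, 6)])   # centre 0 with five leaves

print(path_cover_number(path6))  # meets the lower bound ceil(6/3) = 2
print(path_cover_number(star6))  # meets the upper bound floor(6/2) = 3
```

The path on six vertices splits into two disjoint 2-paths, while in the star every 2-path passes through the centre and covers at most two leaves, so three paths are needed.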

Elsevier - Publisher Connector

Methods for the automatic recording of bird calls and songs in field ornithology

Author: Bardeli Rolf
Frommolt Karl-Heinz
Hill Reinhold
Hüppop Ommo
Koch Martina
Specht Raimund
Tauchert Klaus-Henry
Publication date: 27/11/2012

This review presents the current state of knowledge on automated methods for the acoustic recording of bird calls and songs. Long-term acoustic recordings form the basis for an automated bird census. We examine whether sound recordings are suitable for qualitative and quantitative analysis of bird populations. Special attention is devoted to autonomous recording methods and to the evaluation of long-term recordings using acoustic pattern recognition algorithms. We see realistic scenarios for the use of automated methods in field ornithology in the investigation of nocturnal bird migration, the census of nocturnal breeding bird species, and data collection in core areas of nature reserves.

Hochschulschriftenserver - Universität Frankfurt am Main

CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

Author: Bardeli Rolf
Boujemaa Nozha
Compañó Ramón
Doch Christoph
Geurts Joost
Gouraud Henri
Joly Alexis
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Schreer Oliver
Sebe Nicu
Snoek Cees
Publication venue: Chorus Project Consortium
Publication date: 01/01/2008

After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues; user-centred issues and use cases; and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, a representative set of use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to coordinators of EU projects and of national initiatives. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Robust Identification of Time-Scaled Audio

Author: Frank Kurth
Rolf Bardeli

Automatic identification of audio titles on radio broadcasts is a first step towards automatic annotation of radio programmes. Systems designed for the purpose of identification have to deal with a variety of postprocessing steps potentially applied to audio material at the radio stations. One of the more difficult techniques to handle is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. To allow for fast time-scale-invariant audio identification, the extracted fingerprints are used as input to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

CiteSeerX

Automatic sentence boundary detection for German broadcast news

Author: Bardeli Rolf
Dzhambazov Georgi

In this work we aim to enrich the transcript of an automatic speech recognition system with punctuation by automatically detecting sentence ends. We make use of a simple word-based language model and combine it with a decision tree for the acoustic features of speech. The focus lies on selecting robust acoustic features that reflect the prosodic characteristics of the German language as well as possible. We arrive at a Sentence Unit Error Rate of 54, compared to the state-of-the-art rate for English of 61, by applying a comparable detection system. This is a sound indication that prosody is a stronger cue for the perception of sentence boundaries in German than in English. Our work is, to our knowledge, the first system developed for sentence boundary detection in the broadcast news domain for the German language. Our results can therefore serve as a baseline for further studies in this scenario.

Fraunhofer-ePrints

DiSCo - A speaker and speech recognition evaluation corpus for challenging problems in the broadcast domain

Author: Bardeli Rolf
Baum Doris
Samlowski Barbara
Schneider Daniel
Winkler Thomas
Publication date: 01/01/2009

Baum D, Samlowski B, Winkler T, Bardeli R, Schneider D. DiSCo - A speaker and speech recognition evaluation corpus for challenging problems in the broadcast domain. In: GSCL Symposium Sprachtechnologie und EHumanities. 2009: 1-9.

Systems for speech and speaker recognition already achieve low error rates when applied to high-quality audiovisual broadcast data, such as news shows recorded in a studio environment. Several evaluation corpora exist for this domain in various languages. However, in actual applications for broadcast data analysis, the data requirements are more complex. There are many data types beyond the planned speech of the news anchorperson. For example, interesting live recordings from prominent politicians are often recorded in an environment with challenging acoustic properties. Discussions typically expose highly spontaneous speech, with different speakers talking at the same time. The performance of standard approaches to speech and speaker recognition typically deteriorates under such data characteristics, and dedicated techniques have to be developed to handle these problems. Corresponding evaluation corpora are needed which reflect the challenging conditions of the actual applications. Currently, no German evaluation corpus is available which covers the required acoustic conditions and diverse language properties. This contribution describes the design of a new speaker and speech recognition evaluation corpus for the broadcast domain, reflecting the typical problems encountered in actual applications.

Publications at Bielefeld University

Speech recognition as a retrieval problem

Author: Bardeli Rolf
Rieber Joscha Simon

Common approaches to automatic speech recognition (ASR) are based on training statistical models for the acoustics of speech. In our work, a retrieval-based ASR system is developed that does not rely on training and can therefore be applied more flexibly. It is based on a set of known reference word utterances for each word that may occur in a test string. A test word string is identified by finding the most similar reference for each word, using an approach based on dynamic time warping (DTW). The DTW variant suitable for recognizing strings of connected words is called level-building DTW, proposed by Myers and Rabiner in 1981. It uses a level-wise iteration to match each word in the test utterance with the most similar reference. In our work, an ASR system for connected digit recognition based on level-building DTW is developed, evaluated and compared with a state-of-the-art HMM recognizer.
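
To illustrate the retrieval idea in this abstract, here is a minimal sketch of plain DTW used for isolated-word matching. It is not the level-building variant for connected words described above, and it works on toy one-dimensional sequences rather than real acoustic features. The function names, the reference labels and the sample values are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Local cost is the absolute difference; allowed steps are
    insertion, deletion and match (the standard three-way recursion).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeated against b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def recognize(test, references):
    """Return the label of the reference with the smallest DTW cost to `test`."""
    return min(references, key=lambda label: dtw_distance(test, references[label]))

# Hypothetical one-dimensional "feature contours" standing in for word utterances.
references = {
    "one": [1, 3, 4, 3, 1],
    "two": [5, 5, 2, 0, 0],
}
# The same contour as "one", spoken at half speed.
slow_one = [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]

print(recognize(slow_one, references))
```

Because DTW may align one reference frame with several test frames, the slowed-down contour matches its reference at zero cost. No model is trained: adding a word only requires adding a reference utterance.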

Fraunhofer-ePrints